A Stochastic Sub-Gradient Mirror Descent Algorithm for Non-Smooth and Strongly Convex Functions
Authors
Abstract
Related Works
Mirror descent in non-convex stochastic programming
In this paper, we examine a class of nonconvex stochastic optimization problems which we call variationally coherent, and which properly includes all quasi-convex programs. In view of solving such problems, we focus on the widely used stochastic mirror descent (SMD) family of algorithms, and we establish that the method’s last iterate converges with probability 1. We further introduce a localiz...
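A minimal sketch of the stochastic mirror descent (SMD) template discussed above, instantiated on the probability simplex with the negative-entropy mirror map (exponentiated gradient), returning the last iterate. The quadratic objective, noise model, and step sizes are illustrative assumptions, not the setup of the cited paper.

import numpy as np

def smd_entropic(grad_oracle, x0, step_sizes):
    # Stochastic mirror descent with the negative-entropy mirror map:
    # each step is a multiplicative update followed by renormalization,
    # so every iterate stays on the probability simplex.
    x = np.asarray(x0, dtype=float)
    for eta in step_sizes:
        g = grad_oracle(x)          # noisy (sub)gradient at x
        x = x * np.exp(-eta * g)    # mirror (exponentiated-gradient) step
        x /= x.sum()                # map back to the simplex
    return x                        # last iterate

# Toy usage: minimize a strongly convex quadratic over the simplex
# from noisy gradients (all constants here are arbitrary).
rng = np.random.default_rng(0)
A, c = np.diag([1.0, 2.0, 3.0]), np.array([0.2, 0.5, 0.3])
oracle = lambda x: 2 * A @ (x - c) + 0.1 * rng.standard_normal(3)
steps = 0.5 / np.sqrt(np.arange(1, 2001))
x_last = smd_entropic(oracle, np.ones(3) / 3, steps)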
Efficient Stochastic Gradient Descent for Strongly Convex Optimization
We motivate this study from a recent work on a stochastic gradient descent (SGD) method with only one projection (Mahdavi et al., 2012), which aims at alleviating the computational bottleneck of the standard SGD method in performing the projection at each iteration, and enjoys an O(log T/T) convergence rate for strongly convex optimization. In this paper, we make further contributions along th...
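For context, a minimal sketch of the standard projected SGD baseline whose per-iteration projection is the bottleneck described above; the feasible set (a Euclidean ball), step sizes, and strong-convexity constant are illustrative assumptions, and the cited one-projection method itself is not reproduced here.

import numpy as np

def project_ball(w, radius):
    # Euclidean projection onto the ball {w : ||w|| <= radius}.
    n = np.linalg.norm(w)
    return w if n <= radius else w * (radius / n)

def projected_sgd(grad_oracle, w0, lam, T, radius):
    # Standard SGD for a lam-strongly convex objective with step 1/(lam*t);
    # the projection below runs at EVERY iteration, which is the per-step
    # cost that a one-projection scheme avoids.
    w = np.asarray(w0, dtype=float)
    for t in range(1, T + 1):
        g = grad_oracle(w)
        w = project_ball(w - g / (lam * t), radius)
    return w

For a simple ball this projection is cheap, but for complicated feasible sets it can dominate the per-iteration cost, which is the motivation stated in the abstract.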
Making Gradient Descent Optimal for Strongly Convex Stochastic Optimization
Stochastic gradient descent (SGD) is a simple and popular method to solve stochastic optimization problems which arise in machine learning. For strongly convex problems, its convergence rate was known to be O(log(T)/T), by running SGD for T iterations and returning the average point. However, recent results showed that using a different algorithm, one can get an optimal O(1/T) rate. This mig...
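A minimal sketch of the averaging scheme described above: SGD with step size 1/(lam*t) for a lam-strongly convex objective, returning the uniform average of the iterates, which is the point carrying the O(log(T)/T) guarantee. The unconstrained setting and the gradient oracle are illustrative assumptions.

import numpy as np

def sgd_uniform_average(grad_oracle, w0, lam, T):
    # SGD for a lam-strongly convex objective; returns the plain average
    # of all iterates, the quantity whose O(log(T)/T) rate is discussed above.
    w = np.asarray(w0, dtype=float)
    avg = np.zeros_like(w)
    for t in range(1, T + 1):
        g = grad_oracle(w)
        w = w - g / (lam * t)
        avg += (w - avg) / t    # running uniform average of the iterates
    return avg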
Stochastic gradient descent algorithms for strongly convex functions at O(1/T) convergence rates
With a weighting scheme proportional to t, a traditional stochastic gradient descent (SGD) algorithm achieves a high-probability convergence rate of O(κ/T) for strongly convex functions, instead of O(κ ln(T)/T). We also prove that an accelerated SGD algorithm achieves a rate of O(κ/T).
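A minimal sketch of the weighting scheme proportional to t mentioned above: the same SGD iteration as before, but the returned point weights iterate t by t, so later iterates count more. The objective, constants, and gradient oracle are illustrative assumptions; the accelerated variant is not sketched.

import numpy as np

def sgd_t_weighted_average(grad_oracle, w0, lam, T):
    # SGD with step 1/(lam*t); the output weights iterate t proportionally
    # to t, i.e. w_bar = sum_t t*w_t / sum_t t, instead of uniformly.
    w = np.asarray(w0, dtype=float)
    weighted_sum = np.zeros_like(w)
    for t in range(1, T + 1):
        g = grad_oracle(w)
        w = w - g / (lam * t)
        weighted_sum += t * w
    return weighted_sum * 2.0 / (T * (T + 1))   # sum_{t=1}^T t = T(T+1)/2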
Convergence Rate of Sign Stochastic Gradient Descent for Non-convex Functions
The sign stochastic gradient descent method (signSGD) utilises only the sign of the stochastic gradient in its updates. For deep networks, this one-bit quantisation has surprisingly little impact on convergence speed or generalisation performance compared to SGD. Since signSGD is effectively compressing the gradients, it is very relevant for distributed optimisation where gradients need to be a...
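A minimal sketch of the signSGD update described above: only the elementwise sign of the stochastic gradient enters the step, which is why each gradient can be communicated with one bit per coordinate. The step sizes and gradient oracle are illustrative assumptions.

import numpy as np

def signsgd(grad_oracle, w0, step_sizes):
    # signSGD: move along the elementwise sign of a stochastic gradient,
    # discarding its magnitude (a one-bit-per-coordinate quantisation).
    w = np.asarray(w0, dtype=float)
    for eta in step_sizes:
        g = grad_oracle(w)
        w = w - eta * np.sign(g)
    return w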
Journal
Journal Title: Pure Mathematics
Year: 2018
ISSN: 2160-7583, 2160-7605
DOI: 10.12677/pm.2018.83028